PiCo: High-performance data analytics pipelines in modern C++
نویسندگان
چکیده
منابع مشابه
A Formal Semantics for Data Analytics Pipelines
In this report, we present a new programming model based on Pipelines and Operators, which are the building blocks of programs written in PiCo, a DSL for Data Analytics Pipelines. In the model we propose, we use the term Pipeline to denote a workflow that processes data collections—rather than a computational process—as is common in the data processing community. The novelty with respect to oth...
متن کاملAchieving High Performance for Big Data Analytics
Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication, present numerous programming challenges that include scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges since the data is often streaming at a high velocity and we wish to make near real-time decisions for real-world events. For instance, we...
متن کاملExplaining Outputs in Modern Data Analytics
In this talk I will present our recent work on the design and implementation of a general framework for interactively explaining the outputs of modern data-parallel computations, including iterative data analytics. To produce explanations, existing works adopt a naive backward tracing approach which runs into known issues; naive backward tracing may identify: (i) too much information that is di...
متن کاملBuilding Data-pipelines for High Performance Bulk Data Transfers in a Heterogeneous Grid Environment
The drastic increase in the data requirements of scientific applications combined with an increasing trend towards collaborative research has resulted in the need to transfer large amounts of data among the participating sites. The heterogeneous nature of the storage systems employed by the different sites makes transfer of data among them a difficult problem. The general tendency has been to e...
متن کاملHigh Performance Computing and Big Data Analytics – Paradigms and Challenges
The advent of technology has led to rise in data being captured, stored and analyzed. The requirement of improving the computational models along with managing the voluminous data is a primary concern. The transition of the High Performance Computing from catering to traditional problems to the newer domains like finance, healthcare etc. necessitates the joint analytical model to include Big Da...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Future Generation Computer Systems
سال: 2018
ISSN: 0167-739X
DOI: 10.1016/j.future.2018.05.030